A Multilevel Bayesian Model of Categorical Data Annotation

Abstract

This paper demonstrates the utility of multilevel Bayesian models of data annotation for classifiers. The observable data are the categorizations of items by annotators, from which data may be missing at random or may be replicated. Estimated individual-level parameters include category prevalence, the “true” category of each item, and the accuracy of each annotator in terms of sensitivity and specificity. The multilevel parameters represent average annotator performance and its variance. Samples from the posterior category distribution may be used for probabilistic supervision and evaluation of classifiers, as well as in gold-standard adjudication and active learning. We demonstrate the effectiveness of our approach with simulated data and two real data sets (RTE-1 and MUC-6).
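
The generative story summarized in the abstract can be illustrated with a minimal sketch for the binary case. It is not the paper's implementation: the function names, the Beta hyperparameters, and the `p_observed` missingness rate are assumptions made for illustration, and the posterior here treats prevalence, sensitivity, and specificity as known, whereas the paper estimates them jointly by Bayesian inference.

```python
import random


def simulate_annotations(n_items, n_annotators, prevalence,
                         sens_prior=(8.0, 2.0), spec_prior=(8.0, 2.0),
                         p_observed=1.0, seed=0):
    """Simulate binary annotations (hypothetical helper, not from the paper).

    Each annotator's sensitivity and specificity is drawn from a shared Beta
    distribution (the multilevel part); the Beta parameters are assumptions.
    """
    rng = random.Random(seed)
    sens = [rng.betavariate(*sens_prior) for _ in range(n_annotators)]
    spec = [rng.betavariate(*spec_prior) for _ in range(n_annotators)]
    # "True" category of each item, drawn from the category prevalence.
    truth = [1 if rng.random() < prevalence else 0 for _ in range(n_items)]
    labels = []  # list of (item, annotator, label) triples
    for i, c in enumerate(truth):
        for j in range(n_annotators):
            if rng.random() >= p_observed:
                continue  # this annotation is missing at random
            p_pos = sens[j] if c == 1 else 1.0 - spec[j]
            labels.append((i, j, 1 if rng.random() < p_pos else 0))
    return truth, sens, spec, labels


def posterior_positive(item_labels, prevalence, sens, spec):
    """P(true category = 1 | labels) for one item via Bayes' rule.

    item_labels is a list of (annotator, label) pairs; the parameters are
    treated as known here, which is a simplification.
    """
    like_pos, like_neg = prevalence, 1.0 - prevalence
    for j, y in item_labels:
        like_pos *= sens[j] if y == 1 else 1.0 - sens[j]
        like_neg *= (1.0 - spec[j]) if y == 1 else spec[j]
    return like_pos / (like_pos + like_neg)


if __name__ == "__main__":
    truth, sens, spec, labels = simulate_annotations(5, 3, prevalence=0.4)
    for i in range(5):
        votes = [(j, y) for (ii, j, y) in labels if ii == i]
        print(i, truth[i], round(posterior_positive(votes, 0.4, sens, spec), 3))
```

Because the posterior is a probability rather than a hard vote, an item labeled positive by one accurate annotator can outweigh negative labels from several inaccurate ones, which is the sense in which such posteriors can support probabilistic supervision.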

Similar resources

Multilevel Bayesian Models of Categorical Data Annotation

Abstract This paper demonstrates the utility of multilevel Bayesian models of data annotation for classifiers (also known as coding or rating). The observable data is the set of categorizations of items by annotators (also known as raters or coders) from which data may be missing at random or may be replicated (that is, it handles fixed panel and varying panel designs). Estimated model paramete...

The Analysis of Ordered Categorical Data: An Overview and a Survey of Recent Developments

This article reviews methodologies used for analyzing ordered categorical (ordinal) response variables. We begin by surveying models for data with a single ordinal response variable. We also survey recently proposed strategies for modeling ordinal response variables when the data have some type of clustering or when repeated measurement occurs at various occasions for each subject, such as in l...

Redundant Overdispersion Parameters in Multilevel Models for Categorical Responses

In some distributions, such as the binomial distribution, the variance is determined by the mean. However, in practice, overdispersion is often observed where the variance is larger than that predicted by the mean, and underdispersion is sometimes observed where the variance is smaller. It is well known that overdispersion or underdispersion cannot be modeled for dichotomous responses having a...

ProbMetab: an R package for Bayesian probabilistic annotation of LC–MS-based metabolomics

We present ProbMetab, an R package that promotes substantial improvement in automatic probabilistic liquid chromatography-mass spectrometry-based metabolome annotation. The inference engine core is based on a Bayesian model implemented to (i) allow diverse sources of experimental data and metadata to be systematically incorporated into the model with alternative ways to calculate the likelihood ...

Inferring ground truth from multi-annotator ordinal data: a probabilistic approach

A popular approach for large scale data annotation tasks is crowdsourcing, wherein each data point is labeled by multiple noisy annotators. We consider the problem of inferring ground truth from noisy ordinal labels obtained from multiple annotators of varying and unknown expertise levels. Annotation models for ordinal data have been proposed mostly as extensions of their binary/categorical cou...


Publication date: 2008